Revisiting Again Document Length Hypotheses: TREC 2004 Genomics Track Experiments at Patolis

Author

  • Sumio Fujita
Abstract

The TREC-2004 Genomics track evaluation experiments at Patolis Corporation are described, with a focus on document length issues in different retrieval models such as TF*IDF and probabilistic language modeling approaches. In the genomics ad hoc retrieval task, a combination of pseudo-relevance feedback and reference database feedback is applied. For the triage sub-task, we trained an SVM classifier using leave-one-out cross-validation and calibrated its parameters to be optimal against the training set.

Similar resources

Revisiting Document Length Hypotheses: NTCIR-4 CLIR and Patent Experiments at Patolis

NTCIR-4 experiments on the CLIR J-J and Patent tasks are described, focusing on comparative studies of two test collections and two retrieval approaches in view of document length hypotheses. TF*IDF outperformed the language modeling approach in the CLIR J-J task, while the two approaches performed similarly in the Patent task. Two different document length hypotheses behind two tasks/collections are ass...

RMIT University at TREC 2004

RMIT University participated in two tracks at TREC 2004: Terabyte and Genomics, both for the first time. This paper describes the techniques we applied and our experiments in both tracks, and discusses the results of the genomics track runs; the terabyte track results are unavailable at the time of manuscript submission. We also describe our new zettair search engine, in use for the first time ...

Enhancing Access to the Bibliome: The TREC Genomics Track

The growing amount of scientific discovery in genomics and related biomedical disciplines has led to a corresponding increase in the amount of on-line data and information. A new challenge for biomedical researchers has been how to access and manage this ever-increasing quantity of information. The Text Retrieval Conference (TREC) has implemented a Genomics Track to create an experimental envir...

TREC Genomics 2004

The TREC Genomics track started in 2003 as the first domain-specific track of the Text Retrieval Conference. The aim of the track is to develop various IR tasks specific to the biomedical field. One task of the first year involved the retrieval of documents given a specific gene, while the second task required the extraction of a brief description of gene function from documents. This year sees a...

Experience of Using SVM for the Triage Task in TREC 2004 Genomics Track

This paper reports our knowledge-ignorant machine learning approach to the triage task in the TREC 2004 Genomics track, which is actually a text categorization problem. We applied a Support Vector Machine (SVM) and found that information-gain-based feature selection is helpful. Although we achieved decent performance in leave-one-out cross-validation experiments, the evaluation result on the test data...

Publication date: 2004